Abstract

It is shown that the variable bandwidth density estimator proposed by McKay (1993a and b), following earlier findings by Abramson (1982), approximates density functions in $C^4(\mathbb R^d)$ at the minimax rate in the supremum norm over bounded sets where the preliminary density estimates on which it is based are bounded away from zero. A somewhat more complicated estimator proposed by Jones, McKay and Hu (1994) to approximate densities in $C^6(\mathbb R)$ is also shown to attain minimax rates in sup norm over the same kind of sets. These estimators are strict probability densities.

MSC 2010 subject classification: Primary: 62G07. 

Key words and phrases: kernel density estimator, variable bandwidth, clipping filter, square 
root law, sup-norm loss, spatial adaptation, rates of convergence. 

1 Introduction and statement of results 

Let $X_i$, $i\in\mathbb N$, be independent identically distributed (i.i.d.) observations with density function $f(t)$, $t\in\mathbb R$ (to be replaced below by $t\in\mathbb R^d$). Taking $K$ to be a symmetric probability kernel satisfying some smoothness and differentiability properties, Abramson (1982) proposed the following 'ideal' or 'oracle' variable bandwidth kernel density estimator:

$$f_A(t;h_n) = \frac{1}{nh_n}\sum_{i=1}^n\gamma(t,X_i)K\left(h_n^{-1}\gamma(t,X_i)(t-X_i)\right),\qquad(1)$$

where $\gamma(t,s) = (f(s)\vee f(t)/10)^{1/2}$; this is made into a 'real' estimator by replacing $f$ with a preliminary estimator. In words, in Abramson's estimator the window-width about each observation $X_i$ is inversely proportional to the square root of the density $f$ at $X_i$ unless $f(X_i)$ is too small; the modification of $\gamma(t,X_i)$ for small values of $f(X_i)$ guards against the possibility that observations $X_i$ very far away from $t$ exert too much influence on the estimate of $f(t)$. This estimator adapts to the local density of the data, and if the adaptation is adequate, which it is, it should do better than the usual 'fixed bandwidth' kernel density estimator: in fact Abramson showed that, while the variance of his estimator is of the same order as that of the regular kernel density estimator, its bias is asymptotically of the order of $h_n^4$, assuming $f$ has four uniformly continuous derivatives and $f(t)\neq0$ (while the bias achieved by a symmetric non-negative kernel is only of the order of $h_n^2$). So, one has a non-negative estimator of the density that performs asymptotically as a kernel estimator based on a fourth order (hence, partly negative) kernel. However, this variable bandwidth estimator is not the density function of a true probability measure, since the integral of $f_A(t;h_n)$ over $t$ is not 1 (it would be if $\gamma$ depended only on $s$). Terrell and Scott (1992) and McKay (1993b) constructed different examples showing that the Abramson ideal estimator without the 'clipping filter' $(f(t)/10)^{1/2}$ on $f^{1/2}(X_i)$,

$$f_{HM}(t;h_n) = \frac{1}{nh_n}\sum_{i=1}^nf^{1/2}(X_i)K\left(h_n^{-1}f^{1/2}(X_i)(t-X_i)\right),$$

which is a true probability density, may have bias of order much larger than $h_n^4$, and in fact their examples show that clipping is necessary for such bias reduction. Hall, Hu and Marron (1995) then proposed the ideal estimator

$$f_{HHM}(t;h_n) = \frac{1}{nh_n}\sum_{i=1}^nK\left(h_n^{-1}f^{1/2}(X_i)(t-X_i)\right)f^{1/2}(X_i)\,I\left(|t-X_i|<h_nB\right),\qquad(2)$$

where $B$ is a fixed constant; see also Novak (1999) for a similar estimator. This estimator is non-negative and achieves the desired bias reduction but, like Abramson's, it does not integrate to 1. McKay (1993a and b) discovered a smooth clipping procedure which solves the problem of obtaining a non-negative ideal estimator that integrates to 1 and has a bias of the order of $h_n^4$ for densities with four continuous derivatives. He used in (1) a function $\gamma(t,s) = \gamma(s)$ not dependent on $t$, of the form

$$\gamma(s) = \alpha(f(s)) = c^{1/2}v(f(s)/c) := c^{1/2}p^{1/2}(f(s)/c),\qquad(3)$$

where the function $p$ (or the function $v$) is at least four times differentiable and satisfies $p(x)\ge1$ for all $x$ and $p(x)=x$ for all $x\ge t_0$, for some $1<t_0<\infty$, and $0<c<\infty$ is a fixed number (note $p(x) = v^2(x)$; while McKay (1993b) uses $v$, we will use $p$ for convenience in calculations later). Functions $p$ with these properties will be called clipping functions. Then, McKay's ideal estimator is
ideal estimator is 

$$f_{McK}(t;h_n) = \frac{1}{nh_n}\sum_{i=1}^n\alpha(f(X_i))K\left(h_n^{-1}\alpha(f(X_i))(t-X_i)\right).\qquad(4)$$

The bias reduction to $h_n^4$ is obtained uniformly over regions where $f(t)$ is bounded away from zero (or, if one allows $c$ in (3) to vary with $h_n$, uniformly in $t\in\mathbb R$). Note that $\gamma$ may depend on $h_n$ as well and the estimator $\hat f(t;h_n)$ still integrates to 1. So, one may ask whether a more general function $\gamma(s,h)$ can achieve further bias reduction. McKay (1993b) and Jones, McKay and Hu (1994) show that, using $\gamma(s,h) = \alpha(f(s))(1+h^2\beta(s))$, with $\alpha$ as above and a convenient function $\beta$ that depends on $f$, $f'$ and $f''$, a bias of the order of $h_n^6$ can be achieved on densities that are six times differentiable. This new estimator may be much less practical than the previous one since, in order to implement it, one has to obtain preliminary estimates not only of $f$ but also of its first two derivatives; moreover, these authors report that preliminary simulations with the ideal estimators show only modest gains of this new estimator over (4).
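As a numerical illustration of the normalization issue, the sketch below evaluates the oracle estimators (1) and (4) in one dimension on a hand-picked toy sample, with the true density $f$ taken to be the standard normal. The triweight kernel, the constant $c = 0.05$, the bandwidth and the sample are illustrative assumptions, and `clip_p` is the polynomial clipping function given as an example after Theorem 1 (with $t_0 = 2$). Each summand of (4) integrates to exactly 1 by the substitution $w = \alpha(f(X_i))(t-X_i)/h_n$, so the Riemann sum for McKay's estimator should return 1 up to discretization error; nothing forces the corresponding integral of Abramson's estimator to equal 1, and the sketch simply reports it.

```python
import numpy as np


def phi(x):
    """Standard normal density; plays the role of the 'known' f of the oracle."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def triweight(u):
    """Symmetric kernel K(u) = 35/32 (1 - u^2)^3 on [-1, 1] (illustrative choice)."""
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)


def clip_p(x):
    """Example clipping function: p = 1 on (-inf, 0], p(x) = x on [2, inf)."""
    x = np.asarray(x, dtype=float)
    u = x - 2.0
    g = 1 - 2 * u + 2.25 * u**2 - 1.75 * u**3 + 0.875 * u**4
    return np.where(x <= 0, 1.0, np.where(x >= 2, x, 1 + x**6 / 64 * g))


def alpha(y, c=0.05):
    """alpha(y) = c^{1/2} p^{1/2}(y / c), as in (3)."""
    return np.sqrt(c * clip_p(y / c))


def abramson_oracle(t, data, h):
    """Estimator (1): gamma(t, s) = (f(s) v f(t)/10)^{1/2} depends on t."""
    gam = np.sqrt(np.maximum(phi(data)[None, :], phi(t)[:, None] / 10.0))
    return (gam * triweight(gam * (t[:, None] - data[None, :]) / h)).mean(axis=1) / h


def mckay_oracle(t, data, h, c=0.05):
    """Estimator (4): the window about X_i is scaled by alpha(f(X_i)) only."""
    a = alpha(phi(data), c)[None, :]
    return (a * triweight(a * (t[:, None] - data[None, :]) / h)).mean(axis=1) / h


grid = np.linspace(-12.0, 12.0, 240_001)
dt = grid[1] - grid[0]
sample = np.array([-1.2, 0.0, 0.4, 3.0])   # toy sample, chosen by hand
h = 0.5
mass_mckay = mckay_oracle(grid, sample, h).sum() * dt
mass_abramson = abramson_oracle(grid, sample, h).sum() * dt
print(f"McKay: {mass_mckay:.8f}   Abramson: {mass_abramson:.8f}")
```

The grid is wide enough to contain every window, including the widest one about the tail observation $X = 3$.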

The McKay and Jones-McKay-Hu ideal estimators mentioned so far achieve bias reduction by adapting the bandwidth about each $X_i$ to the size of $f(X_i)$, with smooth clipping for small values of $f(X_i)$, and by using kernels that are concentrated enough to keep the estimators local; moreover, they are strict probability densities. Samiuddin and El-Sayyad (1990) achieved the same results by shifting the centers of the windows by random quantities. See Jones, McKay and Hu (1994), who show that by combining the two methods one can obtain an infinite number of such estimators, the general form of their ideal counterparts being

$$\hat f(t;h_n) = n^{-1}\sum_{i=1}^nh_n^{-1}\gamma(X_i)K\left(h_n^{-1}\gamma(X_i)\left(t-X_i-h_n^2\Gamma(X_i)\right)\right),\qquad(5)$$

where the functions $\gamma$ and $\Gamma$ may depend on the bandwidth $h_n$, the density function $f(t)$ and its derivatives. They considered $\gamma(z) = \alpha(f(z))(1+h^2\beta(z))$ and $\Gamma(z) = A(z)+h^2B(z)$, where $\alpha$, $\beta$, $A$ and $B$ do not depend on $h$ but depend on $f$ (and $\beta$ and $B$ on its derivatives as well). However, Jones, McKay and Hu (1994) argue that among these the most practical is McKay's modification of Abramson's estimator based on (3), followed, at a distance, by the one with $\Gamma=0$ and $\gamma(z) = \gamma(z,h)$ in (5) mentioned above, and we will pay attention only to these estimators in this article.

The estimators (1) to (5) are usually called ideal estimators in the literature, and they give rise to true estimators $\hat f$ by replacing the density and its derivatives in their formulas with preliminary kernel estimators, perhaps using different sequences of bandwidths and different kernels, as in (9) and (16) below. The estimators $\hat f(t)$ are non-linear and it is difficult to measure their discrepancy from $f(t)$. Following Hall and Marron (1988), this task is divided into two parts: a) the study of the ideal estimator, and b) the study of the discrepancy between the ideal and the real estimators.

The literature emphasizes the bias part of the ideal estimators, and on this one may say that the work of McKay (1993a and b) and Jones, McKay and Hu (1994) is final: the clipped estimators achieve bias reduction uniformly in regions where the density is bounded away from zero, and clipping is necessary for this reduction. Regarding the variance part of the ideal estimators, the literature only shows that it is pointwise of the same order as for the usual kernel density estimators; there are no published results on the uniform closeness of the (ideal) estimator to its mean except for one in Giné and Sang (2010) for the estimator (2) of Hall, Hu and Marron (1995). For estimators based on Abramson's square root law, the discrepancy between the ideal and the corresponding real estimators turns out to be exactly of the same order as the difference between the ideal estimator and the true density $f$, not less. This discrepancy was first considered in detail by Hall and Marron (1988) and Hall, Hu and Marron (1995), who proved that it is asymptotically of the order of $n^{-4/9}$ pointwise and in probability for bounded densities with four bounded derivatives. McKay (1993b) adapted their method of proof and corrected some inaccuracies from Hall and Marron (1988) to show that this discrepancy for the multidimensional analogue of (4) is of the order of $n^{-4/(8+d)}$, also pointwise and in probability, and for dimension $d<6$. Giné and Sang (2010) show that, in the case of the Hall, Hu and Marron estimator and in dimension 1, the discrepancy is of the order of $((\log n)/n)^{4/9}$ almost surely and uniformly over intervals where the preliminary estimator is bounded away from zero, as well as uniformly over densities with fixed but arbitrary bounds on their sup norm and on the sup norms of their first four derivatives, thus obtaining a complete result on the uniform rate of approximation of the true density by the real estimator corresponding to (2) for $d=1$. Several of our arguments there were simplified by undersmoothing the preliminary estimator. These rates are optimal. In this article we prove similar results, in $\mathbb R^d$ and without undersmoothing, for the McKay (1993b) estimator based on the generalization of (3) to $\mathbb R^d$, for any $d<\infty$, and also, but only in $\mathbb R$, for the estimator with $\Gamma=0$ and $\gamma(z,h) = \alpha(f(z))/(1+h^2\beta(z))$ in (5). (These estimators can be handled in general and in $\mathbb R^d$, but the details are cumbersome and the results might not be too practical, according to the Jones-McKay-Hu (1994) study.) In order to obtain these results we use empirical process theory, particularly and repeatedly Talagrand's (1996) exponential inequality for empirical processes (which can often be replaced by a general result of Mason and Swanepoel (2010) that in addition yields uniformity in bandwidth) and, at one important instance, an exponential inequality of Major (2006) for canonical U-processes. These tools were not available to previous authors; they were introduced in density estimation by Einmahl and Mason (2000) and by Giné and Mason (2007), respectively. We now describe our results.
Define

$$\mathcal D_r = \mathcal D_r(f) := \left\{t\in\mathbb R^d: f(t)\ge r>t_0c,\ \|t\|\le1/r\right\},\quad r>0.\qquad(6)$$

Here, $c$ and $t_0$ are the constants that appear in the clipping function $\gamma$ in (3). In this paper, we obtain the optimal almost sure rate of uniform convergence on the sets $\mathcal D_r$ for the two Jones-McKay-Hu estimators derived from Abramson's (1982) square root rule. For the first, with bias of order $h_{2,n}^4$, the ideal estimator is the multidimensional version of (4),

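$$f_{McK}(t;h_{2,n}) = \frac{1}{nh_{2,n}^d}\sum_{i=1}^nK\left(\frac{t-X_i}{h_{2,n}}\alpha(f(X_i))\right)\alpha^d(f(X_i)),\qquad(7)$$

with

$$\alpha(x) = c^{1/2}p^{1/2}(c^{-1}x),\quad x\ge0.\qquad(8)$$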

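It is the special case of (5) for $\Gamma=0$ and $\gamma(t) = \alpha(f(t))$ independent of $h_n$. The estimator itself is

$$\hat f(t;h_{1,n},h_{2,n}) = \frac{1}{nh_{2,n}^d}\sum_{i=1}^nK\left(\frac{t-X_i}{h_{2,n}}\alpha(\hat f(X_i;h_{1,n}))\right)\alpha^d(\hat f(X_i;h_{1,n})),\qquad(9)$$

where $\hat f(x;h_{1,n})$ is the classical kernel density estimator

$$\hat f(x;h_{1,n}) = \frac{1}{nh_{1,n}^d}\sum_{i=1}^nK\left(\frac{x-X_i}{h_{1,n}}\right).\qquad(10)$$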
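The two-stage construction (9) and (10) can be sketched in one dimension as follows. This is a minimal illustration, not the authors' code: the triweight kernel (of the form $\Phi(t^2)$ with $\Phi(x) = \frac{35}{32}(1-x)^3$ on $[0,1]$), the constant $c = 0.05$, the clipping function `clip_p` (the polynomial example given after Theorem 1) and the simulated sample are assumptions. The bandwidths are of the form used in Theorem 1 below with $d = 1$. Because $\alpha\ge c^{1/2}>0$, every summand of (9) is a rescaled copy of $K$ that integrates to one, so $\hat f$ is a strict probability density whatever the pilot estimate.

```python
import numpy as np


def triweight(u):
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)


def clip_p(x):
    """Example clipping function with t0 = 2 (assumed, see the text)."""
    x = np.asarray(x, dtype=float)
    u = x - 2.0
    g = 1 - 2 * u + 2.25 * u**2 - 1.75 * u**3 + 0.875 * u**4
    return np.where(x <= 0, 1.0, np.where(x >= 2, x, 1 + x**6 / 64 * g))


def alpha(y, c=0.05):
    """alpha(y) = c^{1/2} p^{1/2}(y / c), as in (8)."""
    return np.sqrt(c * clip_p(y / c))


def kde(x, data, h):
    """Classical kernel density estimator (10) in d = 1."""
    return triweight((x[:, None] - data[None, :]) / h).mean(axis=1) / h


def mckay_estimator(t, data, h1, h2, c=0.05):
    """Real estimator (9): alpha is evaluated at the pilot estimate fhat(X_i; h1)."""
    a = alpha(kde(data, data, h1), c)[None, :]
    return (a * triweight(a * (t[:, None] - data[None, :]) / h2)).mean(axis=1) / h2


rng = np.random.default_rng(0)
n = 500
data = rng.standard_normal(n)          # simulated sample (assumption)
rate = np.log(n) / n
h2, h1 = rate ** (1 / 9), rate ** (1 / 5)   # exponents 1/(8+d), 1/(4+d) with d = 1
grid = np.linspace(-10.0, 10.0, 4001)
fhat = mckay_estimator(grid, data, h1, h2)
mass = fhat.sum() * (grid[1] - grid[0])
print(f"h1={h1:.3f}  h2={h2:.3f}  total mass={mass:.6f}")
```

Changing the pilot bandwidth `h1` changes the shape of $\hat f$ but not its total mass.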



The following notation will be convenient: $\mathcal P_C$ will denote the set of all probability densities on $\mathbb R^d$ that are uniformly continuous and bounded by $C<\infty$, and $\mathcal P_{C,k}$ will denote the set of densities on $\mathbb R^d$ which, together with their partial derivatives of order $k$ or lower, are bounded by $C<\infty$ and uniformly continuous. The dependence on the dimension $d$ will be left implicit both for the regions $\mathcal D_r$ and for the sets of densities $\mathcal P_{C,k}$.
Here is our first theorem:

Theorem 1 Assume that the kernel $K$ on $\mathbb R^d$ is non-negative, integrates to 1 and has the form $K(t) = \Phi(\|t\|^2)$ for some real twice boundedly differentiable even function $\Phi$ with support contained in $[-T,T]$, $T<\infty$. Let $\alpha(x)$ be defined by (8) for a nondecreasing clipping function $p(s)$ ($p(s)\ge1$ for all $s$ and $p(s)=s$ for all $s\ge t_0>1$) with five bounded and uniformly continuous derivatives, and constant $c>0$. Set $h_{2,n} = ((\log n)/n)^{1/(8+d)}$ and $h_{1,n} = ((\log n)/n)^{1/(4+d)}$, $n\in\mathbb N$. Then the estimator $\hat f(t;h_{1,n},h_{2,n})$ given by (9) and (10), with the kernel, bandwidths and function $\alpha$ just described, satisfies

$$\sup_{t\in\mathcal D_r}\left|\hat f(t;h_{1,n},h_{2,n})-f(t)\right| = O_{a.s.}\left(\left(\frac{\log n}{n}\right)^{4/(8+d)}\right)\quad\text{uniformly in } f\in\mathcal P_{C,4}\qquad(11)$$

for any $C<\infty$. If $\hat{\mathcal D}_r(f)$ is defined as

$$\hat{\mathcal D}_r(f) = \left\{t: \hat f(t;h_{1,n})\ge2r>2t_0c,\ \|t\|\le1/r\right\},\qquad(12)$$

then we also have

$$\sup_{t\in\hat{\mathcal D}_r}\left|\hat f(t;h_{1,n},h_{2,n})-f(t)\right| = O_{a.s.}\left(\left(\frac{\log n}{n}\right)^{4/(8+d)}\right)\quad\text{uniformly in } f\in\mathcal P_{C,4}.\qquad(13)$$



We should recall that, given measurable functions $Z_{n,f}(X_1,\dots,X_n)$, with $X_i$ the coordinate functions of $(\mathbb R^d)^{\mathbb N}$, $f\in\mathcal P$, $\mathcal P$ a collection of densities on $\mathbb R^d$, we say that the collection of random variables $Z_{n,f}(X_1,\dots,X_n)$ is asymptotically of the order of $a_n$ uniformly in $f\in\mathcal P$ (written $O_{a.s.}(a_n)$ uniformly in $f$) if there exists $C<\infty$ such that

$$\lim_{k\to\infty}\sup_{f\in\mathcal P}(P_f)^{\mathbb N}\left\{\sup_{n\ge k}\frac{1}{a_n}\left|Z_{n,f}(X_1,\dots,X_n)\right|>C\right\}=0,$$

where $dP_f(x) = f(x)\,dx$, and that it is $o_{a.s.}(a_n)$ uniformly in $f$ if this limit holds for every $C>0$. In the text we will use $\Pr_f$ for $(P_f)^{\mathbb N}$, or even $\Pr$ if $f$ is understood from the context.

We should note that, whereas the bias of the ideal estimator is of the right order uniformly only over $\mathcal D_r$, both the variance part of the ideal estimator and the difference between ideal and real estimators are of the right order uniformly in $t\in\mathbb R^d$.

Here is an example of a five times differentiable clipping function $p$ for which $t_0 = 2$:

$$p(t) = \begin{cases}1+\dfrac{t^6}{64}\left(1-2(t-2)+\frac94(t-2)^2-\frac74(t-2)^3+\frac78(t-2)^4\right) & \text{if } 0\le t\le2,\\ t & \text{if } t>2,\\ 1 & \text{if } t<0.\end{cases}$$

(We could as well take this formula as the definition of $v$, and set $p(x) = v^2(x)$ as in (3).) This is based on McKay's (1993b) example of a four times differentiable clipping function. Other examples of such functions are possible; in particular, see McKay, loc. cit., for an infinitely differentiable one.
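The smoothness claims for this example can be checked with exact polynomial arithmetic. The sketch below (our own check, using NumPy's `Polynomial` class) builds the middle piece of $p$ and evaluates its derivatives at the two junctions: at $t=0$ it must match the constant 1 through the fifth derivative, and at $t=2$ it must match the identity, i.e. $p(2)=2$, $p'(2)=1$ and $p^{(k)}(2)=0$ for $k=2,\dots,5$. The sixth derivatives at the junctions do not match, so $p$ is five, but not six, times continuously differentiable.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

u = P([-2.0, 1.0])                                     # u = t - 2
g = 1 - 2 * u + 2.25 * u**2 - 1.75 * u**3 + 0.875 * u**4
p_mid = 1 + P([0, 0, 0, 0, 0, 0, 1.0]) / 64 * g       # the piece used on [0, 2]

# Values and derivatives 1..5 at the junction with the constant piece (t = 0)...
left = [p_mid.deriv(k)(0.0) if k else p_mid(0.0) for k in range(6)]
# ...and at the junction with the identity piece (t = 2).
right = [p_mid.deriv(k)(2.0) if k else p_mid(2.0) for k in range(6)]
print(left)    # approximately [1, 0, 0, 0, 0, 0]
print(right)   # approximately [2, 1, 0, 0, 0, 0]

# p is also increasing on (0, 2], as required of a nondecreasing clipping function.
slope = p_mid.deriv()(np.linspace(0.01, 2.0, 2000))
print(slope.min() > 0)
```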

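In $\mathbb R$, the ideal estimator with bias of order $h_n^6 = h_{2,n}^6$ that we will consider has the form

$$f_{JMH}(t;h_{2,n}) = \frac{1}{nh_{2,n}}\sum_{i=1}^nK\left(\frac{t-X_i}{h_{2,n}}\gamma_{h_{2,n}}(X_i)\right)\gamma_{h_{2,n}}(X_i),\qquad(14)$$

where

$$\gamma_h(x) = \frac{\alpha(f(x))}{1+h^2\beta(x)},\qquad\alpha(f(x)) = c^{1/2}p^{1/2}(c^{-1}f(x)),\qquad\beta(x) = \frac{\tau_4\left[f''(x)f(x)-2(f'(x))^2\right]}{24\tau_2\alpha^6(f(x))},\qquad(15)$$

with

$$\tau_r = \int K(x)|x|^r\,dx,\quad r>0.$$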

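The estimator (14) is another special case of the general estimator (5) with $\Gamma=0$, but with $\gamma$ depending on the bandwidth $h_{2,n}$. The true estimator corresponding to (14) is

$$\hat f(t;h_{1,n},h_{2,n},h_{3,n},h_{4,n}) = \frac{1}{nh_{2,n}}\sum_{i=1}^nK\left(\frac{t-X_i}{h_{2,n}}\hat\gamma(X_i;h_{1,n},h_{2,n},h_{3,n},h_{4,n})\right)\hat\gamma(X_i;h_{1,n},h_{2,n},h_{3,n},h_{4,n}),\qquad(16)$$

where

$$\hat\gamma(x;h_{1,n},h_{2,n},h_{3,n},h_{4,n}) = \frac{\alpha(\hat f(x;h_{1,n}))}{1+h_{2,n}^2\hat\beta(x;h_{1,n},h_{3,n},h_{4,n})},\qquad\hat\beta(x;h_{1,n},h_{3,n},h_{4,n}) = \frac{\tau_4\left[\hat f_{G''}(x;h_{4,n})\hat f(x;h_{1,n})-2(\hat f_{G'}(x;h_{3,n}))^2\right]}{24\tau_2\alpha^6(\hat f(x;h_{1,n}))}.\qquad(17)$$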

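Here $\hat f(x;h_{1,n})$ is the classical kernel density estimator (10), and $\hat f_{G'}$ and $\hat f_{G''}$ are the estimators of $f'$ and $f''$ given respectively by

$$\hat f_{G'}(x;h_{3,n}) = \frac{1}{nh_{3,n}^2}\sum_{i=1}^nG'\left(\frac{x-X_i}{h_{3,n}}\right)\qquad(18)$$

and

$$\hat f_{G''}(x;h_{4,n}) = \frac{1}{nh_{4,n}^3}\sum_{i=1}^nG''\left(\frac{x-X_i}{h_{4,n}}\right),\qquad(19)$$

where $G$ is a fourth order kernel, that is, such that

$$\int z^jG(z)\,dz = \begin{cases}1 & \text{if } j=0,\\ 0 & \text{if } 1\le j\le3,\\ a\neq0 & \text{if } j=4.\end{cases}$$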
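For concreteness, one compactly supported fourth order kernel that is twice continuously differentiable, as Theorem 2 below requires, is $G(z) = \frac{315}{512}(3-11z^2)(1-z^2)^3$ on $[-1,1]$. This particular triweight-based choice is ours, for illustration; the text does not prescribe one. The sketch below checks its moments exactly via polynomial antiderivatives and implements the derivative estimators (18) and (19) with it.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Fourth order kernel on [-1, 1]; the factor (1 - z^2)^3 makes G, G', G'' vanish at +-1.
G = 315 / 512 * P([3.0, 0.0, -11.0]) * P([1.0, 0.0, -1.0]) ** 3
G1, G2 = G.deriv(1), G.deriv(2)


def moment(j):
    """int_{-1}^{1} z^j G(z) dz, computed exactly from the polynomial."""
    F = (P([0.0] * j + [1.0]) * G).integ()
    return F(1.0) - F(-1.0)


def _on_support(poly, z):
    """Evaluate a polynomial piece on [-1, 1] and set it to zero outside."""
    return np.where(np.abs(z) <= 1.0, poly(z), 0.0)


def fhat_G1(x, data, h):
    """Estimator (18) of f'."""
    z = (np.asarray(x, dtype=float)[:, None] - data[None, :]) / h
    return _on_support(G1, z).sum(axis=1) / (len(data) * h**2)


def fhat_G2(x, data, h):
    """Estimator (19) of f''."""
    z = (np.asarray(x, dtype=float)[:, None] - data[None, :]) / h
    return _on_support(G2, z).sum(axis=1) / (len(data) * h**3)


print([moment(j) for j in range(5)])   # 1, 0, 0, 0 and a nonzero fourth moment (-3/143)
```

Since $G'$ is odd, $\hat f_{G'}(0)$ vanishes for any sample that is symmetric about zero, which gives a quick sanity check of (18).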

We will sketch the proof of the following theorem for this estimator: 

Theorem 2 Assume the kernel $K$ is as in Theorem 1. Assume that the fourth order kernel $G$ is supported by $[-T_G,T_G]$ for some $T_G<\infty$, is twice continuously differentiable, is symmetric about zero and integrates to 1. Let $\alpha(x)$ be defined by (8) for a nondecreasing clipping function $p(s)$ ($p(s)\ge1$ for all $s$ and $p(s)=s$ for all $s\ge t_0>1$) with seven bounded and uniformly continuous derivatives, and constant $c>0$, and let $\gamma$ and $\beta$ be as in (15). Set $h_{1,n} = ((\log n)/n)^{1/5}$, $h_{2,n} = h_{4,n} = ((\log n)/n)^{1/13}$ and $h_{3,n} = ((\log n)/n)^{1/11}$, $n\in\mathbb N$. Let $\hat\gamma$, $\alpha$, $\hat\beta$ be as in (17) for these bandwidths, and let $\hat f$ be defined by (16) with the kernels, bandwidths and function $\hat\gamma$ just described. Then we have

$$\sup_{t\in\mathcal D_r}\left|\hat f(t;h_{1,n},h_{2,n},h_{3,n},h_{4,n})-f(t)\right| = O_{a.s.}\left(\left(\frac{\log n}{n}\right)^{6/13}\right)\quad\text{uniformly in } f\in\mathcal P_{C,6}.\qquad(20)$$

Further, for the region defined in (12), we have

$$\sup_{t\in\hat{\mathcal D}_r}\left|\hat f(t;h_{1,n},h_{2,n},h_{3,n},h_{4,n})-f(t)\right| = O_{a.s.}\left(\left(\frac{\log n}{n}\right)^{6/13}\right)\quad\text{uniformly in } f\in\mathcal P_{C,6}.\qquad(21)$$



It seems natural that Theorem 2 extends to several dimensions, like Theorem 1, but the proof is already quite involved for $d=1$ and the practical value of this estimator is not established (cf. Jones, McKay and Hu (1994)).



2 Bias and variance of the ideal estimator 

In this section we consider the ideal estimators $f_{McK}$ and $f_{JMH}$ in several dimensions. We a) describe the bias reduction for the ideal estimators, mainly following McKay (1993a and b) (see also Jones, McKay and Hu (1994)), and b) show that the uniform rates of concentration of the ideal estimators about their means turn out, not surprisingly, to be the same as those of regular kernel density estimators (Giné and Guillou (2002); Deheuvels (2000) in one dimension).

2.1 Uniform bias expansions

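Our ideal estimator is

$$\hat f(t;h_n)\equiv f_n(t;h_n)\equiv n^{-1}\sum_{i=1}^nh_n^{-d}\gamma_{h_n}^d(X_i)K\left(h_n^{-1}\gamma_{h_n}(X_i)(t-X_i)\right),\quad t\in\mathbb R^d,\qquad(22)$$

that is, we take $\Gamma=0$ in (5). Since $\gamma=\gamma_h$ may depend on $h$, in order to handle this carefully it is better to assume that $\gamma$ depends on another variable $\delta$ and eventually set $\delta=h$:

$$f_n(t;h,\delta) = n^{-1}\sum_{i=1}^nh^{-d}\gamma_\delta^d(X_i)K\left(h^{-1}\gamma_\delta(X_i)(t-X_i)\right),\quad t\in\mathbb R^d.\qquad(23)$$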

The general case (5) can be considered in exactly the same way, but the formulas become more complicated.

The following proposition and its proof are contained in McKay (1993b, Theorems 2.10, 1.1 and 5.13) (see also Hall (1990), and particularly Jones, McKay and Hu (1994, Theorem A.1) and McKay (1993b)). We sketch these authors' proof in the case $d=1$ for the reader's convenience.

Notation: we say that a function $g$ is in $C^l(\Omega)$ if it and its first $l$ derivatives are bounded and uniformly continuous on $\Omega$.

More notation: for $v = (v_1,\dots,v_d)\in(\mathbb N\cup\{0\})^d$, we set $|v| = \sum_{i=1}^dv_i$, $D^v := D_1^{v_1}\circ\cdots\circ D_d^{v_d}$, $v! = v_1!\cdots v_d!$ and $\tau_v = \int_{\mathbb R^d}u_1^{v_1}\cdots u_d^{v_d}K(u)\,du$.

Proposition 1 (McKay (1993a, b)) Let the kernel $K:\mathbb R^d\mapsto\mathbb R$ be symmetric about zero separately in each coordinate, have bounded support and integrate to 1. Assume the density $f$ is in $C^l(\mathbb R^d)$. Assume $\gamma_\delta(t)\ge c>0$ for some $c>0$, all $t\in\mathbb R^d$ and $0\le\delta\le\delta_0$, for some $\delta_0>0$, and that the function $\gamma(t,\delta) := \gamma_\delta(t)$ is in $C^{l+1}(\mathbb R^d\times[0,\delta_0])$. Then we have

$$Ef_n(t;h,\delta) = \sum_{k=0}^la_{k,\delta}(t)h^k+o(h^l)\qquad(24)$$

as $h\to0$, uniformly in $t\in\mathbb R^d$ and $0\le\delta\le\delta_1$ for some $\delta_1>0$, where the functions $a_{k,\delta}$, which are uniformly bounded and equicontinuous, are given by

$$a_{2k+1,\delta}(t) = 0,\qquad a_{2k,\delta}(t) = \sum_{|v|=2k}\frac{\tau_v}{v!}D^v\left(\frac{f}{\gamma_\delta^{2k}}\right)(t)\qquad(25)$$

for $k\le l/2$; in particular, $a_{0,\delta}(t) = f(t)$.

Proof. (For $d=1$.) The difference between the proof of this proposition in dimension 1 and in dimension $d>1$ is that in dimension 1 we can use positivity of the derivative of a certain function in order to partially invert it, whereas in the case of $\mathbb R^d$ we need to use the implicit function theorem. We refer to Lemma 2.11 in McKay (1993b) for the details in any dimension but, as mentioned above, we only consider here the case $d=1$. Since the functions $\gamma_\delta$ are bounded away from zero and their derivatives are bounded, there exists $\varepsilon_1>0$ such that $\gamma_\delta(t-v)-v\gamma_\delta'(t-v)$ is bounded away from zero for all $t\in\mathbb R$, $\delta\in[0,\delta_0]$ and $v\in[-\varepsilon_1,\varepsilon_1]$. Hence, for each $t\in\mathbb R$ and $0\le\delta\le\delta_0$, the function $v\mapsto U_{t,\delta}(v) := v\gamma_\delta(t-v)$, whose derivative is the quantity just considered, is invertible on the neighborhood $[-\varepsilon_1,\varepsilon_1]$ of $v=0$. These inverse functions, say $V_{t,\delta}(u)$, are $l+1$ times differentiable with continuous derivatives with respect to the three variables (this can be seen directly by differentiation, or using the implicit function theorem as in McKay (1993b), Theorem 2.10 and Lemma 2.11, for $x=(t,\delta)$). If the support of $K$ is $[-T,T]$, then $K(h^{-1}\gamma_\delta(s)(t-s)) = 0$ unless $|t-s|\le hT/c$. This implies that the change of variables

$$hz = (t-s)\gamma_\delta(t-(t-s)),\quad\text{that is,}\quad t-s = V_{t,\delta}(hz),$$

in the following integral is valid for all $h$ small enough:

$$Ef_n(t;h,\delta) = \frac1h\int\gamma_\delta(s)f(s)K\left(\frac{t-s}{h}\gamma_\delta(s)\right)ds = \int\gamma_\delta(t-V_{t,\delta}(hz))f(t-V_{t,\delta}(hz))\frac{dV_{t,\delta}(hz)}{d(hz)}K(z)\,dz.$$

Now, the first statement in the proposition follows by developing the function $\gamma_\delta(t-V_{t,\delta}(hz))f(t-V_{t,\delta}(hz))\frac{dV_{t,\delta}(hz)}{d(hz)}$ into powers of $hz$ and integrating, on account of the compactness of the domain of integration ($z\in[-T,T]$) and the differentiability properties of $f$ and $\gamma_\delta$. (Note that the presence of $dV(hz)/d(hz)$ in the integrand requires that the function $V$ be $l+1$ times differentiable in order to obtain differentiability of the integrand up to the $l$-th order, which is necessary for (24).)

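Let $\psi$ be an infinitely differentiable function of bounded support. Then, changing variables ($t = s+hu$), developing $\psi$, changing variables once more ($w = u\gamma_\delta(s)$, so that $\int u^kK(u\gamma_\delta(s))\,du = \tau_k\gamma_\delta^{-k-1}(s)$) and integrating by parts, we obtain

$$\begin{aligned}\int\psi(t)Ef_n(t;h,\delta)\,dt &= \iint\psi(s+hu)\gamma_\delta(s)f(s)K(u\gamma_\delta(s))\,ds\,du\\ &= \sum_{k=0}^l\frac{h^k}{k!}\int\psi^{(k)}(s)\gamma_\delta(s)f(s)\left(\int u^kK(u\gamma_\delta(s))\,du\right)ds+o(h^l)\\ &= \sum_{k=0}^l\frac{h^k\tau_k}{k!}\int\psi(s)(-1)^k\left(\frac{f}{\gamma_\delta^k}\right)^{(k)}(s)\,ds+o(h^l),\end{aligned}$$

and note that, by symmetry, $\tau_k=0$ if $k$ is odd. But by (24),

$$\int\psi(t)Ef_n(t;h,\delta)\,dt = \sum_{k=0}^lh^k\int\psi(t)a_{k,\delta}(t)\,dt+o(h^l),$$

and (25) follows by comparing the coefficients of $h^k$ in both expansions. ■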
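The last step rests on the scaling identity $\int u^kK(u\gamma)\,du = \gamma^{-k-1}\tau_k$ for $\gamma>0$, which follows from the substitution $w = u\gamma$. The sketch below checks it exactly for the triweight kernel $K(u) = \frac{35}{32}(1-u^2)^3$ on $[-1,1]$ (our illustrative choice) using polynomial antiderivatives; it also confirms that the odd moments vanish, as used to kill the odd powers of $h$.

```python
from numpy.polynomial import Polynomial as P


def scaled_kernel(gamma):
    """Triweight K(gamma * u) = 35/32 (1 - gamma^2 u^2)^3, as a polynomial in u."""
    return 35 / 32 * P([1.0, 0.0, -gamma**2]) ** 3


def moment(k, gamma=1.0):
    """int u^k K(gamma u) du over the support |u| <= 1/gamma."""
    F = (P([0.0] * k + [1.0]) * scaled_kernel(gamma)).integ()
    return F(1.0 / gamma) - F(-1.0 / gamma)


tau = [moment(k) for k in range(7)]     # tau_0, ..., tau_6 of the triweight kernel
print(tau)                              # odd entries vanish; tau_2 = 1/9, tau_4 = 1/33
```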

With a slightly less simple proof, one can replace the bounded support hypothesis on $K$ by $\int(1+|x|^l)K(x)\,dx<\infty$, as done in the above mentioned references.

Corollary 1 (McKay (1993a, b)) Let $f$ be a density in $C^4(\mathbb R^d)$, let $p$ be a clipping function in $C^5(\mathbb R)$, set $\alpha(f(t)) = c^{1/2}p^{1/2}(c^{-1}f(t))$ for some $c>0$, and define $f(t;h)$ by equation (23) with $\gamma_\delta(s) = \alpha(f(s))$, that is, $f(t;h) = f_{McK}(t;h)$ (see (7)). Let $\mathcal D_r$ be as in (6). Then,

$$Ef_{McK}(t;h) = f(t)+h^4\left[\sum_{|v|=4}\tau_vD^v(1/f)(t)/v!\right]+o(h^4) = f(t)+O(h^4)\qquad(26)$$

as $h\to0$, uniformly on $\mathcal D_r$.
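Proof. For $t\in\mathcal D_r$ we have $f(t)/c\ge t_0$, hence $\gamma(t) = c^{1/2}p^{1/2}(c^{-1}f(t)) = f^{1/2}(t)$; thus $f/\gamma^2 = 1$ and $f/\gamma^4 = 1/f$ near $t$, so that, by equation (25), $a_2(t) = 0$ and $a_4(t) = \sum_{|v|=4}\tau_vD^v(1/f)(t)/v!$ on $\mathcal D_r$. So, the corollary follows from the previous proposition. ■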
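Corollary 1 can be checked numerically in one dimension. The sketch below computes $Ef_{McK}(0;h) = \int h^{-1}\alpha(f(s))K(h^{-1}\alpha(f(s))(0-s))f(s)\,ds$ by a fine Riemann sum for $f$ the standard normal density, together with the mean of the fixed-bandwidth estimator, and reports how much each bias shrinks when $h$ is halved. The triweight kernel, $c=0.05$ and the clipping function (the example after Theorem 1, $t_0=2$) are illustrative assumptions; since $t_0c = 0.1<f(0)$, the point $t=0$ lies in $\mathcal D_r$ for suitable $r$. A bias of order $h^4$ should give a ratio near $2^4=16$, against $2^2=4$ for the fixed-bandwidth estimator.

```python
import numpy as np


def phi(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def triweight(u):
    return np.where(np.abs(u) <= 1.0, 35.0 / 32.0 * (1.0 - u**2) ** 3, 0.0)


def clip_p(x):
    x = np.asarray(x, dtype=float)
    u = x - 2.0
    g = 1 - 2 * u + 2.25 * u**2 - 1.75 * u**3 + 0.875 * u**4
    return np.where(x <= 0, 1.0, np.where(x >= 2, x, 1 + x**6 / 64 * g))


S = np.linspace(-1.5, 1.5, 300_001)        # both kernels vanish outside this grid here
DS = S[1] - S[0]
A = np.sqrt(0.05 * clip_p(phi(S) / 0.05))  # alpha(f(s)) with c = 0.05


def mean_mckay(t, h):
    """E f_McK(t; h) for the oracle estimator (4)."""
    return np.sum(A / h * triweight(A * (t - S) / h) * phi(S)) * DS


def mean_fixed(t, h):
    """E of the fixed-bandwidth kernel estimator."""
    return np.sum(triweight((t - S) / h) / h * phi(S)) * DS


def bias_ratio(mean_fn, h, t=0.0):
    """bias(h) / bias(h/2); roughly 2^q when the bias is of order h^q."""
    return (mean_fn(t, h) - phi(t)) / (mean_fn(t, h / 2) - phi(t))


r_mckay = bias_ratio(mean_mckay, 0.2)
r_fixed = bias_ratio(mean_fixed, 0.2)
print(r_mckay, r_fixed)
```

The McKay ratio is somewhat above 16 at these bandwidths because the $h^6$ term is not yet negligible at $h=0.2$.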

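Corollary 2 (McKay (1993a), Jones, McKay and Hu (1994)) Let $f$ be a density in $C^6(\mathbb R)$. Let $p$ be a clipping function in $C^7(\mathbb R)$ and, for some $c>0$, set $\alpha(f(t)) = c^{1/2}p^{1/2}(c^{-1}f(t))$ and

$$\beta(t) = \frac{\tau_4\left[f''(t)f(t)-2(f'(t))^2\right]}{24\tau_2\alpha^6(f(t))}.$$

Define

$$\gamma_h(t) = \frac{\alpha(f(t))}{1+h^2\beta(t)},\qquad(27)$$

and consider $f_n(t;h,h)$, the estimator defined by (23) with this $\gamma_\delta$ and with $\delta=h$, that is, $f_n(t;h,h) = f_{JMH}(t;h)$ (see (14)). Let $\mathcal D_r$ be as in (6) for dimension $d=1$. Then,

$$Ef_{JMH}(t;h) = f(t)+h^6\left(\frac12\tau_2(\beta^2)''(t)+\frac16\tau_4\left(\frac{\beta}{f}\right)^{(4)}(t)+\frac{1}{720}\tau_6\left(\frac{1}{f^2}\right)^{(6)}(t)\right)+o(h^6)\qquad(28)$$

as $h\to0$, uniformly on $\mathcal D_r$.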

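Proof. By definition, $\alpha(f(t)) = f^{1/2}(t)$ and

$$\beta(t) = \frac{\tau_4\left[f''(t)f(t)-2(f'(t))^2\right]}{24\tau_2f^3(t)} = -\frac{\tau_4}{24\tau_2}\left(\frac1f\right)''(t)$$

on $\mathcal D_r$, so that the coefficient of $h^4$ in the expansion, $\tau_2\beta''+\frac{\tau_4}{24}(1/f)^{(4)}$, vanishes there; the corollary then follows by direct application of the previous proposition with $l=6$. ■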

The estimator for $l=6$ here is similar to, but slightly different from, the one in Section 4.1 of Jones, McKay and Hu (1994): they defined $\gamma_\delta = \alpha(1+h^2\beta)$, but it is somewhat more convenient for us to define it by equation (27).



2.2 Rate of uniform deviation from the mean 

The next proposition and its first proof have their origin in a result of Giné and Guillou (2002) that obtains the almost sure exact discrepancy rate between kernel density estimators in $\mathbb R^d$ and their expected values, uniformly on the whole space. A proposition closer to the one below was proved in Giné and Sang (2010) for the Hall-Hu-Marron estimator, and Mason and Swanepoel (2010) (see also Mason (2010)) proved a general theorem that also yields the result, even with uniformity in bandwidth. Whereas the results in these references all imply the proposition below (by exact analogy or as a direct consequence), we should emphasize that its proof consists of nothing but a direct and straightforward application of Talagrand's famous exponential inequality, in the version in Einmahl and Mason (2000, inequality A.1 combined with Proposition A.1) and in Giné and Guillou (2001, Proposition 2.2; 2002, Corollary 2.2), which also turns out to be the main component in the proofs of all the above mentioned results.

Assumptions 1 The sequence $h_n$ has the form

$$h_n = ((\log n)/n)^\eta\qquad(29)$$

for some $0<\eta<1/d$. The kernel $K$ has the form $K(t) = \Phi(\|t\|^2)$, where $\Phi$ is bounded, has support in $[0,T]$ for some $T<\infty$, is of bounded variation and is left or right continuous; $f$ is a bounded density function; and $\gamma_h(x) = \alpha(x)/(1+h^2\beta(x))$, where the functions $\alpha$ and $\beta$ are continuous and bounded, $\alpha$ is bounded away from zero, and $0<h<1/(2\|\beta\|_\infty)^{1/2}$. The ideal estimator $\hat f(t;h_n) = f_n(t)$ is defined, with this kernel, function $\gamma_h$ and bandwidths $h_n$, as in (22).

All these assumptions can be weakened, but this is all we need in this article.

We recall that $N(\mathcal K,d,\varepsilon)$, the $\varepsilon$-covering number of the metric or pseudo-metric space $(\mathcal K,d)$, is defined as the smallest number of (open) $d$-balls of radius not exceeding $\varepsilon$ needed to cover $\mathcal K$. Also, a collection of measurable functions $\mathcal K$ on a measurable space $(S,\mathcal S)$ is of VC type relative to an envelope $F$ (a measurable function $F$ such that $F(s)\ge|f(s)|$ for all $s\in S$ and $f\in\mathcal K$) if there exist finite constants $A$, $v$ such that, for all probability measures $Q$ on $(S,\mathcal S)$,

$$N(\mathcal K,L_2(Q),\varepsilon)\le\left(\frac{A\|F\|_{L_2(Q)}}{\varepsilon}\right)^v,\quad0<\varepsilon\le2\sup_{f,g\in\mathcal K}\|f-g\|_{L_2(Q)}.\qquad(30)$$

All but one of the classes of functions that we will consider in this article can be shown to be of VC type using the following lemma, which is a variation on Lemma 4 of Giné and Sang (2010), and whose idea comes from Nolan and Pollard (1987) (inexcusably, we failed to mention this article in Giné and Sang (2010)).

Lemma 1 Let $K$, $f$ and $\gamma_h$ satisfy Assumptions 1. Let $\mathcal G$ be a uniformly bounded VC type class of measurable functions on $\mathbb R^d$ with respect to a constant envelope $G$ and admitting constants $A_1$, $v_1$ in equation (30). Let $\mathcal K$ be the class of functions

$$\mathcal K = \left\{K\left(\frac{t-\cdot}{h}\gamma_h(\cdot)\right)g(\cdot):\ t\in\mathbb R^d,\ 0<h<1/(2\|\beta\|_\infty)^{1/2},\ g\in\mathcal G\right\}.\qquad(31)$$

Then, there exists a universal constant $R$ such that for every Borel probability measure $Q$ on $\mathbb R^d$

$$N(\mathcal K,L_2(Q),\varepsilon)\le\left(\frac{(R\vee A_1)\|\Phi\|_VG}{\varepsilon}\right)^{8d+20+v_1},\qquad(32)$$

where $\|\Phi\|_V$ is the total variation norm of $\Phi$; that is, $\mathcal K$ is a bounded class of functions of VC type with envelope $\|\Phi\|_VG$ and admitting characteristic constants $A = R\vee A_1$ and $v = 8d+20+v_1$, independent of $f$.

Proof. By adding an arbitrarily small strictly increasing function to the positive and negative variation functions of $\Phi$, we have $\Phi = \Phi_1-\Phi_2$ with $\Phi_i$ strictly increasing, positive and bounded, with $\|\Phi_1\|_\infty$ ($\|\Phi_2\|_\infty$) arbitrarily close to the positive (negative) variation of $\Phi$. For $i=1,2$, let $\mathcal K_i$ be the classes of functions

$$\mathcal K_i = \left\{\Phi_i\left(\left\|\frac{t-\cdot}{h}\right\|^2\gamma_h^2(\cdot)\right):\ t\in\mathbb R^d,\ 0<h<1/(2\|\beta\|_\infty)^{1/2}\right\}.$$

Then the subgraphs of the functions in the class $\mathcal K_i$ have the form

$$\left\{(x,u):\ \|t-x\|^2\alpha^2(x)-h^2\Phi_i^{-1}(u)(1+h^2\beta(x))^2\ge0\right\},$$

and so they are the positivity sets of functions from the linear space of functions of the two variables $u$ and $x$ spanned by $\alpha^2(x)$, $x_j\alpha^2(x)$ for $j=1,\dots,d$, $x_j^2\alpha^2(x)$ for $j=1,\dots,d$, $\Phi_i^{-1}(u)$, $\Phi_i^{-1}(u)\beta(x)$ and $\Phi_i^{-1}(u)\beta^2(x)$. This space has dimension $2d+4$; hence, by a result of Dudley (1978), the subgraphs of $\mathcal K_i$ are VC of index $2d+5$. Therefore, by the Dudley-Pollard entropy bound for VC-subgraph classes (e.g. Pollard (1984)), we have

$$N(\mathcal K_i,L_2(Q),\varepsilon)\le\left(\frac{R\|\Phi_i\|_\infty}{\varepsilon}\right)^{4d+10},\quad0<\varepsilon\le\|\Phi_i\|_\infty,\ i=1,2,\qquad(33)$$

where $R$ is a universal constant.

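Now, any $H\in\mathcal K$ can be written as $H = k_1g-k_2g$ for $k_i\in\mathcal K_i$ and $g\in\mathcal G$, so that, for any probability measure $Q$ and any $\bar H = \bar k_1\bar g-\bar k_2\bar g$ of the same form, we have

$$Q(H-\bar H)^2 = Q\left((k_1-k_2)g-(\bar k_1-\bar k_2)\bar g\right)^2\le4G^2Q(k_1-\bar k_1)^2+4G^2Q(k_2-\bar k_2)^2+2\|K\|_\infty^2Q(g-\bar g)^2.$$

Given $\varepsilon>0$, let $\delta_1 = \varepsilon/(\sqrt{12}G)$ and $\delta_2 = \varepsilon/(\sqrt6\|K\|_\infty)$. Then, if the collections of functions $k_1^{(1)},\dots,k_1^{(N_1)}$ and $k_2^{(1)},\dots,k_2^{(N_2)}$ are $L_2(Q)$ $\delta_1$-dense in the classes $\mathcal K_1$ and $\mathcal K_2$ respectively, and $g_1,\dots,g_{N_3}$ are $L_2(Q)$ $\delta_2$-dense in the class $\mathcal G$, with optimal cardinalities $N_i = N(\mathcal K_i,L_2(Q),\delta_1)$, $i=1,2$, and $N_3 = N(\mathcal G,L_2(Q),\delta_2)$, then, by the previous inequality, the functions $(k_1^{(i)}-k_2^{(j)})g_l$ are $L_2(Q)$ $\varepsilon$-dense in $\mathcal K$. Since there are at most $N_1N_2N_3$ such functions (this estimate may not be optimal), inequality (32) follows. ■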
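The two facts used in this last step are the pointwise bound behind the displayed inequality, namely $((k_1-k_2)g-(\bar k_1-\bar k_2)\bar g)^2\le4G^2(k_1-\bar k_1)^2+4G^2(k_2-\bar k_2)^2+2\|K\|_\infty^2(g-\bar g)^2$ whenever $|g|\le G$ and $|\bar k_1-\bar k_2|\le\|K\|_\infty$ (write the difference as $(k_1-\bar k_1)g-(k_2-\bar k_2)g+(\bar k_1-\bar k_2)(g-\bar g)$ and use $(a+b)^2\le2a^2+2b^2$ twice), and the choice of radii, for which $8G^2\delta_1^2+2\|K\|_\infty^2\delta_2^2=\varepsilon^2$. A quick randomized sanity check of both on random discrete measures $Q$, with purely synthetic function values, might look like this:

```python
import numpy as np

rng = np.random.default_rng(2)


def split_bound_gap(m=1000, G=3.0, K_sup=1.0):
    """Q(H - Hbar)^2 minus its bound, for random function values on m atoms."""
    k1, k1b, k2, k2b = rng.uniform(0.0, K_sup, size=(4, m))   # values in [0, sup K]
    g, gb = rng.uniform(-G, G, size=(2, m))                    # values in [-G, G]
    w = rng.dirichlet(np.ones(m))                              # weights of a discrete Q

    def Q(v):
        return np.sum(w * v)

    lhs = Q(((k1 - k2) * g - (k1b - k2b) * gb) ** 2)
    rhs = (4 * G**2 * Q((k1 - k1b) ** 2) + 4 * G**2 * Q((k2 - k2b) ** 2)
           + 2 * K_sup**2 * Q((g - gb) ** 2))
    return lhs - rhs


def radii(eps, G, K_sup):
    """delta_1 and delta_2 as chosen in the proof of Lemma 1."""
    return eps / (np.sqrt(12.0) * G), eps / (np.sqrt(6.0) * K_sup)


gaps = [split_bound_gap() for _ in range(200)]
d1, d2 = radii(0.1, 3.0, 1.0)
print(max(gaps), 8 * 3.0**2 * d1**2 + 2 * 1.0**2 * d2**2)   # gap <= 0; second value is eps^2
```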

This lemma is important for us because it allows direct application of the theorem in Mason and Swanepoel (2010) or, what is more natural in our case, direct use of a version of Talagrand's (1996) inequality, e.g. in the form given in Einmahl and Mason (2000) or in Giné and Guillou (2001, 2002). This version states that if $P$ is a probability measure on a measurable space $(S,\mathcal S)$, $X_i: S^{\mathbb N}\mapsto S$ are the coordinate functions of $S^{\mathbb N}$ (which are i.i.d. with law $P$), and a class of functions $\mathcal F$ is bounded, countable and of VC type for an envelope $F$, then there exist $0<C_i<\infty$, $i=1,2$, depending only on $v$ and $A$, such that, for all $\lambda\ge1\vee2C_1$ and all $t$ satisfying

$$C_1\sqrt n\,\sigma\sqrt{\log\frac{A\|F\|_\infty}{\sigma}}\le t\le\lambda\frac{n\sigma^2}{\|F\|_\infty},\qquad(34)$$

we have

$$\Pr\left\{\sup_{g\in\mathcal F}\left|\sum_{i=1}^n(g(X_i)-Pg)\right|>t\right\}\le C_2\exp\left(-\frac{t^2}{C_2\lambda n\sigma^2}\right),\qquad(35)$$

where $\|F\|_\infty^2\ge\sigma^2\ge\sup_{g\in\mathcal F}\mathrm{Var}_P(g)$. The class $\mathcal K$ is not countable, but the continuity properties of the functions defining it imply that the sup over $g\in\mathcal K$ of $\left|\sum_{i=1}^n(g(X_i)-Pg)\right|$ is in fact a countable supremum. Whenever this happens in this article we will say that the class is measurable.
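Proposition 2 Under the hypotheses in Assumptions 1,

$$\sup_{t\in\mathbb R^d}\left|\hat f(t;h_n)-E\hat f(t;h_n)\right| = \|f_n-Ef_n\|_\infty = O_{a.s.}\left(\sqrt{\frac{\log n}{nh_n^d}}\right)$$

uniformly over all densities $f$ such that $\|f\|_\infty\le C$, for any $0<C<\infty$; that is, there exists $L<\infty$ such that, if $\mathcal P_C$ is the set of these densities, then

$$\lim_{k\to\infty}\sup_{f\in\mathcal P_C}\Pr_f\left\{\sup_{n\ge k}\sqrt{\frac{nh_n^d}{\log n}}\,\|f_n-Ef_n\|_\infty>L\right\}=0.$$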
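Proof. For any $L>0$ we have

$$\Pr\left\{\sqrt{\frac{nh_n^d}{\log n}}\,\|f_n-Ef_n\|_\infty>L\right\} = \Pr\left\{\sup_{t\in\mathbb R^d}\left|\sum_{i=1}^n\left[K\left(\frac{t-X_i}{h_n}\gamma_{h_n}(X_i)\right)\gamma_{h_n}^d(X_i)-EK\left(\frac{t-X}{h_n}\gamma_{h_n}(X)\right)\gamma_{h_n}^d(X)\right]\right|>L\sqrt{nh_n^d\log n}\right\}.\qquad(36)$$

By Lemma 1, for $0<h<1/(2\|\beta\|_\infty)^{1/2}$, the subclasses

$$\mathcal F_h = \left\{K\left(\frac{t-\cdot}{h}\gamma_h(\cdot)\right)\gamma_h^d(\cdot):\ t\in\mathbb R^d\right\}\qquad(37)$$

are measurable VC classes of functions with respect to the envelope $U = 2^d\|\Phi\|_V\|\alpha\|_\infty^d$ and admitting constants $A$ and $v$ independent of $h$. Indeed, the class of functions $\mathcal G = \{\gamma_h^d: 0<h<1/(2\|\beta\|_\infty)^{1/2}\}$ is bounded by $2^d\|\alpha\|_\infty^d$ and is clearly of VC type with $v=1$, since

$$\left|\frac{\alpha^d(x)}{(1+h^2\beta(x))^d}-\frac{\alpha^d(x)}{(1+\bar h^2\beta(x))^d}\right|\le d\,2^{d+1}\|\alpha\|_\infty^d(2\|\beta\|_\infty)^{1/2}|h-\bar h|;$$

hence the class of functions $\mathcal K$ defined as in (31) with this $\mathcal G$, the kernel $K$ and the functions $\gamma_h$ from this proposition is VC by Lemma 1 and, since it contains the classes $\mathcal F_h$, so are these classes, with the same $A$ and $v$ as $\mathcal K$. The continuity properties of $K$ and $\gamma_h$ imply that they are measurable. Hence Talagrand's inequality (35) applies to the classes (37), and in order to apply it we only need a sensible bound $\sigma_h^2$ for the maximum variance of the functions in each class $\mathcal F_h$. Since $0<\nu := (2/3)\inf_x\alpha(x)\le\gamma_h\le2\|\alpha\|_\infty<\infty$ and $K(w) = 0$ unless $\|w\|\le T^{1/2}$, we have

$$\mathrm{Var}\left(K\left(\frac{t-X}{h}\gamma_h(X)\right)\gamma_h^d(X)\right)\le\int K^2\left(\frac{t-x}{h}\gamma_h(x)\right)\gamma_h^{2d}(x)f(x)\,dx = h^d\int K^2\left(u\gamma_h(t-hu)\right)\gamma_h^{2d}(t-hu)f(t-hu)\,du\le h^d\left(\frac{2T^{1/2}}{\nu}\right)^d\|K\|_\infty^2(2\|\alpha\|_\infty)^{2d}\|f\|_\infty.\qquad(38)$$

So, assuming $f\in\mathcal P_C$, we can take $\sigma_h^2 := C(K,\alpha)Ch^d$ with $C(K,\alpha) = 2^{2d}(2T^{1/2}/\nu)^d\|K\|_\infty^2\|\alpha\|_\infty^{2d}$. Take now $h=h_n$. The envelope $U_{h_n}$ of $\mathcal F_{h_n}$ can be taken to be the constant $U$ above, hence it is eventually much larger than $\sigma_{h_n}$ and, by (29), we also have $\sqrt{nh_n^d\log(U/\sigma_{h_n})}\ll nh_n^d$ (here and elsewhere, the sign $\ll$ should be read as 'of smaller order than' when the indexing variable, in this case $n$, tends to infinity). Let $C_1$ and $C_2$ be the constants in Talagrand's inequality (35) common to all the classes $\mathcal F_h$, and fix $\lambda = 1\vee2C_1$. It is then clear that there exist $n_0<\infty$ and $L>0$ such that simultaneously

$$C_1\sqrt n\,\sigma_{h_n}\sqrt{\log\frac{AU}{\sigma_{h_n}}}\le L\sqrt{nh_n^d\log n}\le\lambda\frac{n\sigma_{h_n}^2}{U}\qquad(39)$$

for all $n\ge n_0$, and $L>\sqrt{C_2\lambda C(K,\alpha)C}$ (note that $\log(AU/\sigma_{h_n})$ is of the order of a constant times $\log n$). Then, applying Talagrand's inequality (35) with $t = L\sqrt{nh_n^d\log n}$, for which $t^2/(n\sigma_{h_n}^2) = L^2\log n/(C(K,\alpha)C)$, to the empirical process in (36) gives

$$\sum_{n\ge n_0}\sup_{f\in\mathcal P_C}\Pr_f\left\{\sqrt{\frac{nh_n^d}{\log n}}\,\|f_n-Ef_n\|_\infty>L\right\}\le C_2\sum_{n\ge n_0}\exp\left(-\frac{L^2\log n}{C_2\lambda C(K,\alpha)C}\right)<\infty.\qquad(40)$$

Since this bound does not depend on $f$, the tails $\sum_{n\ge k}$ of the left side tend to zero uniformly in $f\in\mathcal P_C$ as $k\to\infty$, and this proves the proposition. ■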
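The Lipschitz bound for $h\mapsto\gamma_h^d$ used in the proof comes from $|\partial_h(1+h^2\beta)^{-d}| = 2dh|\beta|(1+h^2\beta)^{-d-1}\le2dh|\beta|2^{d+1}$, because $1+h^2\beta\ge1/2$ when $h^2\|\beta\|_\infty<1/2$, combined with $h<(2\|\beta\|_\infty)^{-1/2}$. A randomized check of the resulting inequality, with synthetic values of $\alpha$, $\beta$ and $h$ chosen only for illustration, is:

```python
import numpy as np

rng = np.random.default_rng(1)


def lipschitz_gap(d, alpha_sup, beta_sup, trials=100_000):
    """max over random draws of |gamma_h^d - gamma_hbar^d| minus the claimed bound."""
    h_max = 1.0 / np.sqrt(2.0 * beta_sup)            # admissible range of h
    a = rng.uniform(0.0, alpha_sup, trials)          # values alpha(x)
    b = rng.uniform(-beta_sup, beta_sup, trials)     # values beta(x)
    h, hb = rng.uniform(0.0, h_max, size=(2, trials))
    lhs = np.abs(a**d / (1 + h**2 * b) ** d - a**d / (1 + hb**2 * b) ** d)
    bound = d * 2.0 ** (d + 1) * alpha_sup**d * np.sqrt(2.0 * beta_sup) * np.abs(h - hb)
    return np.max(lhs - bound)


for d in (1, 2, 3, 5):
    print(d, lipschitz_gap(d, alpha_sup=1.5, beta_sup=4.0))   # should all be <= 0
```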
Here is an alternative proof: 
Proof. The facts that the union of the classes $\mathcal F_h$, $0<h<1/(2\|\beta\|_\infty)^{1/2}$, is VC, bounded and measurable, and that inequality (38) holds, verify that the class of functions $\{K((t-\cdot)h^{-1}\gamma_h(\cdot))\gamma_h^d(\cdot): t\in\mathbb R^d,\ 0<h<1/(2\|\beta\|_\infty)^{1/2}\}$ satisfies the hypotheses of the general theorem in Mason and Swanepoel (2010). Their result then implies a stronger version of Proposition 2, with $h_n$ replaced by $h$ and with uniformity in $h$ within a range that includes $h_n$. Their theorem is not stated with uniformity in $f\in\mathcal P_C$, but the inequalities used in their proof imply it. ■

Corollary [1] in Subsection 2.1 shows that the bias of the ideal estimator fMcK{t;h2,n) from 
d?]) is of the order of /i| „ uniformly in t € 2?^ and in / G PcA^ Proposition [3] (with (3 = 
in the definition of jh) gives that the uniform deviation from its mean, sup^gRd \ fMcK{t; ft.2,n) ~ 

EfAicKit; /i2,n)|, has order Oa.s. {\/^^) uniformly in t G E"^ and in f eVc (any < C < oo). 

Hence, they are of the same order for ft,2,„ = ((logn)/n)^/'^*+'^', and we have 

sup \fMcK(t; h2,n) ^ /(<)| = Oa.s. {{{logn) / nf Uniformly in / G Vca- (41) 

Likewise, Corollary [2] and Proposition [2] give that, for ft,2,n = ((logrt)/n)^/^'^ 

sup \ fjKH{t- h2,n) - fit)\ = Oa.s. ( {{log n) / nf ^ 'A Uniformly in / G Vc,6- (42) 
tev,. ^ ' 

It is this balance between the bias and the random centered (variance) components of the 
difference fMcKit] /i2.n) — fit) (or fjKH{t] ^2,n) — /(O) that prevents us from taking advantage 
of the uniformity in bandwidth in the Mason-Swanepoel (2010) theorem. Note also that applying 
this result as in the second proof of Proposition [2] and applying Talagrand's inequality psp as in 
its first proof, both require checking exactly the same facts (namely, measurability, boundedness 
and VC character of a class of functions, plus a bound for the variances of the functions in the 
class). For these two reasons we continue using Talagrand's inequality in the few instances below 
where either works. 

3 Estimation of densities in C^{W^): Proof of Theorem [I 

In this section we develop the proof of Theorem [1] The pattern of proof is similar to that of 
the main results in Hall and Marron (1988), corrected in Hall, Hu and Marron (1995), and 
particularly in Cine and Sang (2010), but the details are quite different. 

We make the following assumptions on the kernel K , the clipping function p, the densities / 
and the bandsequences: 

Assumptions 2 The kernel K is assumed to satisfy all the conditions in Proposition [7] and 
Assumptions{Ji and to have, besides, uniformly bounded second order partial derivatives. We also 
assume that the densities f are bounded and have at least four bounded and uniformly continuous 
derivatives, that is, f G Vc,4 for some C < oo. The nondecreasing clipping function p : M — > M 
is assumed to have two bounded derivatives, p{s) > 1 for all s and p{s) = s for all s > to > 1. 
Here c and to are fixed constants. We set = ((logn)/n)^/('*+'^) and /i2,n = ((logn)/n)^/(*+''^, 
n€N. 

Following Hall and Marron (1988), we compare the ideal estimator (O with the real one ([9]), 
with some changes similar to the ones introduced in Gine and Sang (2010), particularly, the 
use of inequalities from empirical and t^-processes. Proving the theorem in R'' for any d > 
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requires more precision than in dimension 1, in particular we cannot undersniooth the prehminary 
estimator and we must proceed differently with several estimations. 

By in order to obtain a uniform convergence rate of ((logn)/n)^/(®+'^) for teh difference 
between the true estimator ([9]) and the density f{t) we only need to show that the uniform 
convergence rate of the difference between the true estimator © and the ideal estimator ([7]) is 
at most of this order. 

Recall a{t) := cp^^^{c^^t). Define 6{t) — 6{t,n) by the equation 

^ c.(/(t;fei,„))-a(/(t)) ^ p(c-V"(t;fei,„))-p(c-V(t)) 

"(/W) pl/2(c-2/W)b^/'(c-V"(t;^l,n))+pl/2(c-2/(t))]' 

so that 

aif{t;h,^^))^aifmi + 5{t)). (44) 
Since p is a Lipschitz function and p > 1, 

\S{t)\<Bc-'\f{t;hi,,)~f{t)\ (45) 

for a constant B that depends only on p. Set 

D{t; /ii,„) = fit; hi^n) - Ef{t; /ii,„) and 6(i; /ii,„) = Ef{t; /ii,„) - /(t) 

and note that 

\\D{-; /ii,„)||oo = Oa.s. (^^/^^^^ uniformly in feVc (46) 
for all < C < cx) by a result of Gine and Guillou (2002), and that 

\\b{-; /ii,„)||oo = Oa.s.(^?,„) uniformly in / G Vc,2 (47) 
by the classical bias computation for symmetric kernels. Then we have, by P5|) . pS)) and (H71) . 

sup |(5(t)| = Oa.s.(/i?,„) = Oa...(l) uniformly in / G 75^2. (48) 

We also have, for further use, 

a(/(i;/ii,„))-«(/(i)) 



5(t) 



a'(/(t))[/(t; /ii,„) - /(O] , a"{v)[f{t; /^i,„) - /(i)]' 



2a(/(t)) 



(49) 



where 7y = ri{t) > is between f{t; /ii_„) and f{t) (so, not only it depends on t but also on / and 
on the whole sample. Note that, since p > I and p' and p" are uniformly bounded on [0, oo), we 
have \a"{ri{t, /ii,n))| £ c~'^A for some constant A that does not depend on or t but only on p. 
It is convenient as well to record the following expansion of a'^if) implied by and (05]): 

a^ifit; /ii,„)) - a'(/(0)(l + d6{t)) + 6i{t) (50) 

with 

||<5i||oc = Oa.s.(||<5||L) uniformly in / e 7^0.2, (51) 



14 



hence, by (gS]) and 

Il^illoo = Oa.s.(||/„(-;/ii,n) - /(OIIL) uniformly in / G Vc.2- 
For the kernel K we have the expansion 



(52) 



K 



t~ X. 



-a{f{X,-h^,n)) 



K 



t-X. 



I'2.n 



with 



, ft~x. 



{t - X,), 



(53) 

a{f{X,))5{Xi)+52{t-X,) 



{t X)j{t •^)^ ^2(^(^))^2(^) 



where ^ is a (random) point in the line connecting the points ^j^^a{f{Xi)) and ij^^a{f{Xi)) + 
a{f{Xi))5{Xi)^ as before. X having compact support, a being bounded from below by c 
(and above on bounded sets) and 5 satisfying (|48| and (|45)) . we get that, for each n, on the set 
where |l/„(.; /ii,„) - /(OHL < cV(2S) (so, ||,5||oo < 1/2), 



\52{t,x)\ < ^Mk(2Ti/Vc)252(^)/(||i_^|l <2Ti/2c-i/.2,„)- 



in particular. 



sup Mt,x)\ = Oa.s. (||/„(-;/ii,n) - /(OIIL) uniformly in / G Pc,: 



Set 



Li{t) =^UK'^{t) and L{t) = dK{t) + Li{t), teW^, 



(54) 



(55) 



(56) 



and notice that by symmetry, integration by parts gives that i is a second order kernel, as in 
dimension 1 (ifj denotes the partial derivative of K in he direction of the i-th coordinate, and 
ti dentoes the «-the coordinate of t € W^). The decompositions (HU, ([50]) and ([55]) then give: 



n 



i=l 



E 



t-x, 

h2,n 

t-X. 



a{f{X,))j a\f{XmX,) 
a{f(X,))\ <5i(X0 + a''{f{Xi))52{t,X{) 



(57) 



dLi 



t-X, 



T'2,n 



aifiX,)) aHf{X,))5^X{) 



(58) 



t-X, 



a{f{Xi))] 5{XMX,) + da''{f{Xi))5{Xi)62{t,X,) 



-J2^2{t,X,)diiX,) 



(59) 
(60) 
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The sums ([55)1 - ([SO)) are of lower and decreasing order, and will be dealt with first. Here is an 
elementary observation that will be useful: if for all n > r we have that Un ^ Vn whenever 
Wn < a, then for all a and for k > r, 



Pr <j sup Un>a}<Pri sup K > « ^ + Pr < sup Wn>a}. 

%>k J ln>fe J ln>fc 



We will use this in conjunction with the fact that there is a < cxi such that 

as k ^ oo. 



sup Prf < sup ,, , 

fevc.2 I n>k V log" 



||/(i;/ii,n)-/||oo >a 



(61) 



(62) 



which follows from P5|) and (|T7)) since for our value of hi^n, we have hf „ = (logn)/ {nhf „). Let 
us consider the first term from ([551) : by the previous observation 



Pr sup 



1 



1 



< Pr sup 



^>fc 11112 n 



E 

^1 

n 

E 



K\K^a{f{X,))]5,{Xi) 



K 



t-x. 



> a 



(63) 



> 



a/6 +Pr ( max||/iif^5i|loo > b] , 

I \7i>k 1 



and the last term, for suitable h < oo, converges to zero uniformly in / G Vc,2 as fc — oo by (j62p 
and (1521) . Now, by change of variables, for any r > 0, 



E 



( i^a{f{X^)) 



<C{K,r)\\f\\^hl^ 



for some finite constant C{K,r), so that, for a suitable constant m, the first term at the right 
hand side of is bounded by 



^Pr 

n—k 



1 



E 



( LJ^a{f{X,)) 



> a/6 — TO . 



Here we can use Talagrand's inequality ((35)) (if a class of functions is VC type, so is the class of its 
absolute values, by direct computation of L2 distances), which, by the second moment estimate 
above (r = 2) and boundedness of K, and for suitable a (in particular making a/b — m > 0), 

shows that this series is dominated, uniformly if / S Vc, by J2n>k^~^^'^^'" ^'-'^ some t] > 0, 
which tends to zero. (Note that we are using Talagrand's inequality for t in the upper limit of its 
domain (|M)). whereas typically one uses it for t in its lower limit.) Thus, we have proved that, 
uniformly in / e Vc,2, 



1 " 

— — sup y 



K[^—^a(f{X^:))\5,{X,) 



.(/.!„) = oa... ((logn)/n)4/(«^ 



d) 



Basically, what Talagrand and the simple observation (I5T)) do is to show that the order of the first 
term in ()58p is at most the order of ||(5i||oo multiplied by the order of the sup of the expectations 
of the (absolute values of) the summands without 5i{Xi). We can likewise follow this pattern 
of proof and get similar results for all the terms in (|55)) - ()5(I)) . One has to use that the classes of 
functions {Li (^^a(/(-)) ■.t(^W^,h>Q] and {l{\\t - -jl < 2T^/^c-^h) : i e M'^, /i > 0} are VC 
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type by Lemma [T] (the class of indicator functions is needed in order to handle the three terms 
in (UHll-dnni) that contain 62)- We then get 

sup \m\ = 0,.s.{hlJ, sup \m\ = Oa.s.(/i?,J, sup KEnil = Oa.s.(/i?,„) 

(64) 

uniformly in / G 7^0,2- 

The estimation of (j57l) is much more difficult. We decompose into several pieces using first 
the expansion (j49p of S, and then the decomposition of / — / into variance D and bias b, as 
follows: 

Notice that the term (|67| is very similar to the terms in ((58)) . and it has clearly the same order 
(recaU (gS])), that is 

sup I (1171)1 = Oa.s.(/it„) uniformly in rc,2- (68) 

We devote two subsections to the estimation of the remaining two terms, we anticipate that the 
main term is ()65p. 

3.1. Estimation of the bias term (1661). Consider the classes of functions 



Qn := S^Qix) = L (L-^a{f{xj)^ (a'^)'(/(x))6(x; /ii,„) : t e R-^j . (69) 

Recah that L{t) = J2^=i UK[ + dK{t) and that K{t) = $(|lt|p), $ twice boundedly differentiable 
and with bounded support. Hence, Lif) = 2||t||2$'(||t|j2) + rf$(||t|p). Since the function ■u$'('u) + 
(i$(u) is of bounded variation and bounded, the kernel L satisfies the hypotheses of the kernel 
K in Lemma [T] (with s — 2). So, these classes conform, for each n, to Lemma[T]with Q the class 
consisting of the single function (a'^)'(/(a;))6(a;, /ii,„), which, by (|T7)) . is uniformly bounded by 
M{c,p, C)h\ n'li f € Vc,2i for some constant M depending only on c, p and C. We conclude by 
that lemma that they are VC each with with envelope M{c,p, C, K)h\ „ for some other constant 
depending on the stated objects, and all with the same characteristic constants A and v. Since 
the continuity hypotheses make these classes measurable, this will allow us to apply Talagrand's 
inequality. If we set 

Q^{t) = L li^a{f{X,))^ {a''y{f{XmX^■M,n) 
it then follows, by the bound (j47p on 6, by boundedness and bounded support of L, by bound- 
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edness of p' and p > 1, that, for all t and all / e 7^c.2, 



sup EQ^,{t) 



< 



< 



and similarly, 



We then have 



sup 



||6(-;/ii,„)||Lll("')'llLll/lloo4„sup / L'{ua{f{t^uh2,n)))du 
M{C,c,p,K)hl^hl„, 



sup \Q,{t)\ < M{c,p,C,K)hl 



< sup 



E 



[Q^{t) - EQ,{t)] 



and Talagrand's inequality (|35|) . with cr^ 
gives that, for some D, L > 1, 



M{C,c,p,K)htM,, 



f sup —\EQi{t)\ 
and = 2M{c,p,C,K)hl, 



sup Pr/ < sup 



> DWn/i|„ft^„logn > < 6*2 ^exp (-Llogn) < oo. 



Since y n/if ^/ij n log jt./ (?T-ft-2 n) << ((log /"•)''^ (s+i^) ^ the term will be at most of the order 

of ((logn)/n)^/(^+'^) only if the expectation term sup,pjjd j4—\EQi{t)\ is of this order or smaller 

(uniformly in t € M'' and / € Vc,2)- The obvious bound for E\Qi{t)\, that one obtains just like 
the bound above for EQ\{t), is of the order of /if „ft.2 nj which then gives an order of h\ „ for the 
term This is not good enough, although it would be if we undersmoothed the preliminary 

estimator a little by taking = ((logn)/n)2/(8+'^) instead of /ii,„ = ((logn)/n)i/(*+''): this 
works, and in fact we made this choice in Gine and Sang (2010) on a related problem and d = 1, 
however, some extra work along the lines suggested by Hall and Marron (1988) will allow us 
to prove the right rate for (1551) with the optimal /ii^„, as follows. In the setting of the proof 
of Proposition [TJ but in dimension d, the inverse function theorem yields the existence and 
differentiability of Vt{u), the inverse function of Ut{v) = va{f{t — v)) in a neighborhood of zero 
independent of t (this can be readily seen, or one can see it in McKay (1993b)). This, together 
with the facts that L has bounded support, a is bounded away from zero and /i2.„ — >■ 0, justifies 
the change of variables hz = [t — s)a{f{t — {t — s))) in the expression of EQi{t), to get (omitting 
the subindex n in the bandwidth). 



L 



«(/(^)) {a'')'{f{s))f{s 



hi 



(a^y(f(t-Vdh2z)))f{t~Vt{h2z)) 



hi 



V — h2Z 



(/(«) - fis))duds 



X / K{y){f{t-Vt{h2z)-yhi)^ J{t-Vt{h2z)))dy\L{z)dz 
Jw J 

F{h2z)G{h2z)L{z)dz. 
Then, using that J L(z)dz ~ J ZiL{z)dz = and expanding, we obtain 



hi ^ ' 2 



/ I E ^KjG + FlG'^ + F'fi[ + FGl^){e{h2z))z,z, \ L{z)dz. 
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Now, F and its partial derivatives are bounded, L is bounded and has bounded support, and, if 
/ e Vc,4^ then, expanding g{t - Vt{h2z) - yhi) ~ g{t - Vt{h2z)), for g = f, /-, /"j, and using the 
symmetry of K, we get that UGHoo, ||G^||oo, ||G-'j||oo are all 0{h1) uniformly in / e Vca- We 
conclude 

sup -^\EQiit)\ = Oihl^hU uniformly in / e Vca, (70) 
and this in turn gives, together with the above application of Talagrand's inequality, 



sup 



Oa.s.(n-^/(«+'')) (71) 



uniformly in / G Vca- 

3.2. Estimation of the variance term (|65p. This term requires [/-processes. Given a 
function H of two variables, and two i.i.d. random variables X and Y such that H{X, Y) is 
integrable, we recall the [/-statistic notation 

^ ' l<i^j<n 

where the variables Xi are i.i.d. copies oi X, as well as the second order Hoeffding projection of 
H{X,Y), 

^2{H){X,Y) = H{X,Y) - ExH{X,Y) - EyH{X,Y)+EH, 

If we set 

Ht{X,Y) := L |^^a(/(X))^ (a'^)'(/(X))i^ {h^) ' ^^^^ 
then (|65p decomposes into a diagonal term and a [/-statistic term, as follows: 



n(n - 1) 7l/l|„ ^ V 'l2,n / 

1 " 

= ^ -y^{Ht{X,,X,) - EyHtiX^Y)) + Un(Ht - EyHti-^Y)) 

n(n — 1) ^-^ 

1 " 

= -Ty2{Ht{X,,X,) - EyHtiX,, Y)) + [/„ (MHti; •))) 

n(n — 1) 

^ ^ z— 1 

1 " 

+ -y^{ExHt{X,X,)-EHt). (73) 

1=1 

These are two empirical process terms and a canonical [/-statistic term. The last term will turn 
out to be the only significant one. 

For the first empirical process in ([75]) . set 

Q,it) = Ht{X,,Xi) - EYHt{X,,Y) 

and observe that, very much as in the simple bounds for moments of Qi in the previous subsection, 

sup sup £'|Qi(t)| < L1/12 „, sup sup i;Qi(t) < L2/12 „, sup sup |Qi(t)| < L3 
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for some finite constants Li = Li(C, c,p, K). So, 



tG 



< 



nhf. 



(74) 



The suprenium part corresponds to the empirical process over the class of functions of x 



Qn 



t ~ X 



aifix)) Ua^ifix)) KiO)-EK 



X- X 



t e 



which, by Lemma [T] is of VC type with respect to a constant envelope and admits characteristic 
constants A and v independent of n and /, just as in the previous subsection for the classes Q„ 
defined by (|69|) . Then, as in this previous instance, Talagrand's inequality (1351) gives that there 
exist Di,D2 > 1 such that 



E sup Pr/ < sup 



> DiJ nh^ ,^ log 71 > < C2 E "^^P (--^2 log 71) < 00, 



which, since '^nh^ j^ logn/(n^/if _„/i2^„) << ((logn)/n)'*/(^+''^ and si 
n/if „ >> (n/logn)'*/(*^+''\ together with yields 



since also 



„2Ud ud 



Y^{Ht{X„X,) - EYHt{X,,Y)) 



3a...((logn)/n)4/(«+'^)) uniformly in / G Vc- 

(75) 

The canonical {/-statistic term in (j73p is best handled by means of an exponential inequality 
of Major (2006). We will state his inequality for bounded VC type classes of functions of two 
variables only. Let T be such a class of functions and let > cr^ > ||Var(/(Xi, X2))||jr. 

Then, if is a uniformly bounded, countable class of VC type, there exist < Ci < 00, 1 < i < 3, 
depending on v and A such that, for all t satisfying 



Ciricrlog ■ 



2IIFII 



< t < 



II 00 



we have 



>< ^ < C2 exp -C: 



(76) 



Major states the theorem for {vr^/} of VC type, but it is easy to see that if T is VC type 
for F then f : / G J-} is VC type for the envelope AF. Our classes J- will be the classes 
{Ht : t G Mf^}. Note that they depend on n via /ii^„, i — 1,2, but we do not display this 
dependence because they are VC type for a fixed constant envelope, admitting characteristic 
constants A and v independent of n: this follows from Lemma [1] with L instead of K, and with 
Q consisting of the single bounded function {a'^)' (f{X))K (tT^) (^^^ Section 3.1, proof that 
the classes defined in are VC). Since, as is easy to check, for / G Vc, 

EH^iX,Y) < M{c,p,C,K)hl^hl^, 
we can take cr^ = M(c,p, C, K)hf „/i2 „ and conclude that there exist Di,D2 > 1 such that 

V sup Pr/isup |C/„(7r2(i/t))| > D^J hf ^h^ „{logn)/n \ < C2 V cxp (-i^2 bgn) < 00. 

„ /:||/llcc<c Um-' V ' ' J „ 
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Since we obtain 

sup .d \d l^»(^2(gt))l = Oa.s.(n-^/('+''') uniformly in / G T'c- (77) 

Having dealt with the first two terms in the last two lines of ([75)1 . we will now handle the 
third and last, namely, 

1 " 

Tit;hi,n,h2,n) ^ —^—^y^{ExHt{X,X,) - EHt) (78) 
or, setting, for ease of notation. 



g{t,x)^ExHt{X,x)^E_ 



X 



(79) 



1 " 

T{t- /ii,„, ;i2,„) = ^ ^ ^(5(i, ^.) " s.g(i, X)). 

'^"■l,n"2.n 



Let be the class of functions {g{t, ■) : t E M''}. We check that this class is of VC type 
and apply Talagrand's inequality once more. We have, for any s,t G R'', and Borel probability 
measure Q, 



EQ{g{t,x) - g{s,x)) 



2 



</i..((a^)'(/(X))i.(^)) Ex[L{t-^^inX)))-L{^aifiX)))) dQix) 

<\\ic^'y\\Uf\\o.hiJK\\lJ (^L(i_l«(/(y)))-L(l-^a(/(2/)))) fiy)dy 

= \\ic^'y\\l\\f\\o.hiJK\\lEfii,~Q' 

where £s and £t are functions from the class C := {L (^^a(/(-))) : t e M*^, > 0} which is VC 
for a constant envelope by Lemma [T] (as L satisfies the hypotheses of K in that lemma -see 
Subsection 3.1-). This lemma then proves that for all Q and for all / G Vc, 



, J/, ^ 8ci+20 

i?(c,p,d,X,C)/if/'^ 



7V(g,L2(Q),£) < — , 0<e<i?(c,p,d,i^,C)/if/„' (80) 

for some R — R{c,p,d, K,C) depending only on the estipulated parameters, in particular, Q is 
VC for the constant envelope Rhf^^ (that depends on n), with characteristic constants A ^ 1 
and V = 8d + 20 independent of n and / G Ve- 
in order to apply Talagarand's (|35|) inequality, we need to estimate Eg^{t,X). With the 
change of variables x ~ t — /ii.nW — /ii.n-z, y = t — h2^nZ, u ^ t — hi_nW — /i2.n-z ^ ^i,nS, of 
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determinant /if''„/i2ni obtain 

<II/IIL02,JI("')'IIL 



t — X 



h2,n 



< Bk 



X Lyza{f{t — h2,nz))jK{s)K{s + w)dsd'wdz 



((t^"^ + z)a{f{t - hi^nW - h2.nZ))) 



(81) 



where the last inequality follows from the bounded support of L and K and a{t) > c. Then, since 
the envelope F of the VC class G can be taken to be Bxhf^^ and cr^ to be BKC^hl'^^h2^„c~^ 
(by ([50)1 and ([5T|) ). for constants B/f that depend only on K, we get by ([M)) and (|551) that there 
exist constants Di, D2 > 1 depending only on K , C, d, p and c such that 



sup Pry ' 

fevc 

and note that 



> Di Jnhf^hi^ log n} <C2 cxp (-i^2 log n) , 



V'^/i?'n/il„l0grV(^i/i?.„'^2,n) = ((l0gn)/n)4/(8+^) 



we get that 



sup \T{t; hi,nh2.n)\ = Oa.s. ([(log n)/n]4/(8+'')) uniformly in / G Vc- 



(82) 



Combining the estimates dTS]), ([771) and dH]) into 1^ yields 



sup 



Y^L ( i^a(/(X,)) ) (a'^)'(/(X,))i5(X,;/ii,„) 



([(logn)A 



i4/(8+d)\ 



(83) 

uniformly in / € Vc- 

Plugging in the estimates (IMl) . (|7T|) and into the decompositions ([57 )) - (pn|) and 

,n, ^2,n ) - /(^; ^2,n), yields: 



Proposition 3 Under Assumptions\^ for any C < 00 the difference between the actual and the 
ideal estimators of a density f satisfies 



sup |/(i; /li,„, /l2,„) - f{t\ /l2,„)| = Oa 



4/(8+rf)^ 



uniformly in / G 



Moreover, 



sup \f{t]hi^ri,h2,n)- f{t;h2,n)-T{t,hi,nh2,n)\ = Oa 



logn\ 



4/(8+d)N 



uniformly in / G T'c,4- 
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3.3. End of the proof of Theorem [TJ Proposition [3] together with the resuhs in Section 2 
for the bias (Corollary [T]) and the variance (Proposition [5]) of the ideal estimator complete the 
proof of the asymptotic estimate ([TT|) in Theorem [1] To prove dT51) . we note that, by (pS)) and 
(HZD, ||/(i;fti,„) - /lU = Oa.s.{{{\ogn)/nf/'^'^+'^'^) uniformly in / e Vc,2, that is, there exists 
A < oo such that 

lim sup Pr<^sup ||/(i»; " /lU > A ^ = 0. (84) 

'=^°°/e-Pc,2 i [n>fc \iogny J 

Since ||/(a;) - /||oo < A((logn)/n)2/(4+'i) implies P;?(w) C Vr as soon as r > A((logn)/n)2/(4+'i), 
follows immediately from pT|) and (IMl) . This concludes the proof of Theorem [TJ 

4 Estimation of densities in C^(R): Proof of Theorem H 

In this section we make the following assumptions on the kernel K , a new kernel G, the clipping 
function p, the densities / and the bandsequences: 

Assumptions 3 We assume that the kernel K is non-negative, bounded and is symmetric about 
zero, has support contained in [—T,T], T < oo, integrates to 1 and has a uniformly bounded 
second derivative. We also assume that the densities f are bounded and have at least six bounded 
derivatives, 

f e Vc.6 ■■= {/ is a density : ||/(^)||oo < C, < fc < 6} (85) 

for some C < oo. We assume that G is a fourth order kernel G supported by [—Tq, Tq\ for some 
Tq < oo, integrates to 1, is symmetric about zero and has two bounded, continuous derivatives. 
The nondecreasing clipping function p : R — !■ M is assumed to have a bounded and continuous 
derivative, p{s) > 1 for all s and p{s) = s for all s > to > 1, where c and to are fixed constants. 
We set /ii,„ = ((logn)/n)i/5, /i2,„ = /^4,„ = ((log n)/n)i/i3, h-i,n = {{logn)/n)^/^\ n € N. 

The ideal estimator we study in this section is as defined in (jl4p , and the corresponding true 
estimator as in (fT6)) . 

By , in order to prove Theorem [5J it suffices to show that the uniform convergence rate 
of the discrepancy between the true estimator (|16p and the ideal estimator is of the same 
order. 

We will use S{x), D{x; hi^n) and b{x;hi^n) as defined in the last section, and note that 
(|45]) . (|46)) . (|47| and (gH) stih hold (with d=l). Following the proof of Theorem 2.3 in Gine and 
Guillou (2002) or Proposition 1 in Gine and Sang (2010), under the conditions on G and / in 
Assumption |31 it is easy to show that 



xSR \ V "^H^n 



sup Ifciix; /i3,n) - EfcA^; h^,n)\ = O 

and 



sup 1/02(2;; ^4, «)- -5/02(2^; /l4,n)| = Oa.s. I a/ ^.r"^'" 

uniformly in / G Vc.fn and classical bias computations with m-th order kernels give 

sMEfGAx-M.n)-nx)\ < ^ (/ G{u)uUv^ \\f^'\\oohin. 
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and 

sup\EfG,{x;h^,n)-f'{x)\ f G{u)u^du) \\f^'^\\o.hl^ 

(recall the definitions of and /g^ from ([T5)) and ([TO))). Define 

i{x) = fcdxihs^n) - f'{x), C{x) = fcAx-.hi^n) " f"{x). 

These estimates give 

sup |^(a;)| = Oa.s.(^"*^"(logn)^/") mriformly in / such that / e Vcfi (86) 

and 

sup |C(x)| = Oa.s.(n-'*/"(logn)'^/i3) uniformly in / such that / G Vcfi, (87) 

£ceR 

and, since /ii^„ is as in Section 3, (pSj) gives that 

sup |,5(x)| = Oa.s.(^"'^'(log")'/') uniformly in / G 7'c.2. 

For /? and /3(a;; /ii.n, ^3.n, /i4,Ti) as defined below (|14p . define p{x) so that 

(1 + p(.t))- 



Then, with some elementary but tedious work, 

^ hlMl + 5{x)f - ASjx) - [f"[x) + ax)]{D + 6) - f{x)ax) + 4f (x)C(x) + 2e(:r)^]} 
24T2cV(c"2/(a;; /ii,n))/T4 + /^l,™ [/g. (a:; Vn)/>; /ii.n) - 2(/gi (o^; /l3,n))2] 

(89) 

where S'(a;) := f"{x)f{x) — 2(/'(x))^, Z? — D{x; /ii,„), 6 = b{x; /ii,„) and is the absolute fc-th 
moment of K. Now, since the denominator is bounded away from zero (p is, and the second 
summand in the denominator tends to zero uniformly in /), the bounds and (HH]), (|T7)) 

give 

sup |p(a;)| = Oa.s.(^^^^"(logn)^/^^) uniformly in / such that / G Vc,6- (90) 

The definition of p allows us to write 

1 - 1 = lip + S + Sp). (91) 

Recall the definitions of the functions L and Li from last section (with d = 1), the definition 
of 7 = 7/i2 „ from (1141) and that of 7 from (jl6p . and note that 7 is bounded above and away from 
zero. We then have 

K (i-2^j{X,; /ii.„, h2.n, h3,n, h^.n)) = K (i—2^j{X,){l + 6{X,) + p{X,) + 5{X,)p{X,)) 
= if (^^7(^0) +K' (^7^7(^0) ^-^l{xmXi) + p{Xi)+5{X,)p{X,)) + 52{t,Xi) 
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where 

h 



52{t,X,) = ^^(^^) ^^(^^)^(^^))2^ (92) 



2,n 

S being a (random) number between ^^7(Xi) and i^^^{Xi){l + 5{Xi)+ p{Xi) + 5{X.i)p{X.i)). 

Then, plugging this development and the development of 7 in the definition of /, we 
obtain the following, where we drop all the arguments for brevity, 

^ n ^ n 

fit; /li,„, h2,n, /l3,«, /i4,n) -f{t; /l2,n) = LSj H r ip7 (93) 



= 1 



1 " 

y [L(5/97 + Li{S + p + Spfj + [1 + S + p + dp)j62] . (94) 

l2,n 



First, we check the order of the term ([M]). Since 7 is bounded above and below, (5 — > a.s. by 

< i?i|S| for some constant Si < 00 and therefore 



t-Xi 



and p — > a.s. by (|90l) . we have 

j'C"(S) is bounded since if" has bounded support. This together with ([55| and (^01) 

(which show p << S) and the definition of 62 in imply that 62{t,Xi) < i?2||/||oo<5^(-^i) for 
some constant B2. Then, again by ((88)) and (|90|. (|94)) is dominated by -^j^ J27=i ^"^1 which has 
order n-4'^/^5(logn)47/65_ Therefore, 

(l9l=Oa.s.(n-'/^'(logn)«/") (95) 



uniformly in t € K and in / e Vcfi- Next, we will check the order of the two terms at the 
right in (j93p . Each of them will require further decompositions. For the first term, using the 
decomposition (|49|) of 5, we have, just as in ((65|) - (|67|) . 



' i—i ' t—i ^ 

1 ^/t~X,..\a'{f{X,))b{X,;hi^n 



nh2,n^^ \ /l2,n / q;(/(Aj)) 

n/l2,n V "•i.n / a{f(Xi)) 

Since the functions L{x),a" {ri{x)) and 7(2;) are bounded and the clipping function p{x) is 
bounded away from zero, it can be easily seen, using (|47l) . that is dominated by 

= Oa.s. = Oa.s. 1^ ^ ^ uniformly on M and in / G 7'c.2. (99) 

As with the bounds for C^(R) densities, the main terms in the present decomposition are (j96p 
and (j97p . and they can be handled as in the previous section, basically using the Talagrand 
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and Major inequalities: the bounds obtained have the same expressions as those bounds in 
terms of the bandwidths, which now of course are different. For the bias term, the bound is 
of the order of h\^h\^ — ilogn) /nf^/^^ , see ([7T|) . For the variance term, it is of the order 
of (log7i)/n)i/2/i^y^ = (n/logn)-6/i3 (gge ([TS]), ^ and particularly, ^). So, we have, 
uniformly in f G R and in / e 7^0,2, 



logn 



6/l3^ 



931 =Oa 



log n 



6/l3^ 



(100) 



Finally, we check the order of the second term in (1^1) . X^iLi ^Pl- We give the details 

for the estimation of this term because it is different from the previous case. In the definition 
(IMl), by assumption H (gSD, (gT]), (EH) and dHT]), sup^gR |/g, (x; /i4,„)/(a;; /ii,„) - 2(/gi (x; /i3,n))^| 
is bounded almost surely. Hence, since p > 1, we have, for the denominator de{p) of p, 

inf Me(p)| 

:= inf |24t2cV(c~VV; /ii.rO)/^4 + hj Ja^i^; hi,n)f{x; /ii.„) - 2hl MgAx; /J3,n))^| 
> Bd 

almost surely, for some universal constant Bd > if n is large enough. Therefore, almost surely. 



sup 

tGK 



1 " 

—r — y^^p^ 

nh2,„ ^ 



, '12, n \ ^ 

< sup > 

- Bdu ^ 



"2,n \ ^ 

tew Bdfi ^ 



"■2,n \ ^ 



h2,n 

sup -— 

tGR i'dfl 



E 

i=l 



1 n 

■ ^^P ~5~ 1^ 

tgR Ban ^ 



L 



2L 



t~ X, 

h2,n 

t-x, 

h2,n 

t-x, 

h2,n 

t-x. 

h2,n 

t-x. 



Mx^)j-/ixMi + S{x,))'-i]six,) 

7(X,)^ j{X,)[f"{X,) + C{Xi)][D{X,;hi,n) + b{X,;hi,n)] 
j{X,)) j{X,)f{X,)aX, 



h2,n 



hix,)j ^{x,)fix,)ax,) 
jix,)) i{Xi)ax, f 



(101) 
(102) 
(103) 
(104) 
(105) 



Since the functions L{x), 7(2;), f{x),f'{x) and f"{x) are bounded, it follows that (jlOip 
and p02D are of the order Oa.s.([(logn)/n]^^/^^) = Oa.s.([(logn)/n]^/^^) uniformly in t € M and 
/ G 7^c,2, the first by the estimate (l88l) and the second by the classical bounds (|46)) and (l47l) . 
It also follows from dHU), that ([TU5]) is Oa.s.([(logn)/n]^/"+i/i3) uniformly in t and /. Now we 
estimate the term (|103l) . If a class of functions is VC type, so is the class of its absolute values 
(covering numbers are smaller) , hence. Lemma [T] shows that the classes of functions 



L 



t — X 



h2,n 



7(x) 7(a;)/(x) 



t e 



(106) 



are of VC type for envelopes of the order of 
constants A an v. If we set 



3/2 



0(1) and admitting the same characteristic 



L 



t-x. 



7(X,;) 7(^.)/(^0 



26 



it then follows by the properties of L and p, that 



sup 



EN,{t) < \\f\\li'h2,n, snpEN^t) < \\f\th,.^n, sup N,{t) < \\f\t^, 



So, we have 



"•2,n \ ^ 



t-X,, 



< sup 



■sup/i2,„|£;A^i(i)|/Bd 



ten 



< 



sup 



'■2,n 

n 



i=l 



+ ||/||3/2n-2/13(log„)2/13^ 



and Talagrand's inequality gives that there exist Di,D2 > 1 such that 



y sup Pr/ < sup 



^[iV,(t)-i?7V,(<)] 



> Diyjnh2,n logn > < C2 ^exp logn) < 00. 



The last two estimates yield 



h{X,) i{X,).f{Xi) 



0^.sXn''^'^^{\ognfl^'^) uniformly in feVc.,: 



^2/13 



Combining this with the bound ((57| for C, gives 



^2,n \ ^ 



^2.n 



7(X,) 7(X,)/(X,)C(X,) 



< sup |C(a;)| X sup ^ V L [-r-^ 

xGK tGR i^dn ^ V ft2,n 

= Oa.s.([(logn)/n]6/i3) a.s. 

Note that „ n^^/^'^(log n)^/^'^ and /i4,„ ^ n~^/^3(log n)^/^'^ play critical roles in this esti- 
mation. The same argument produces the same bound for the term (|104p . 

So, we have shown that each of the three terms in the decomposition ((93|) and (l94)) of 
/(t;/ii,„,/i2,„,/i3,n,/i4,n) " ^2,«) IS at most of the order Oa.s.([(logn)/n]S/i3) uniformly in 
t and /, that is. 

Proposition 4 Under the Assumptions\^ for any C < 00 we have: 

6/13\ 



sup 

tea 



,m ^2,n, ^3,Ti, ^4,n) fjKH{t] ^2,ri) 



logn 



uniformly in / € 7^c,2- 



Combining this proposition with the estimates of the bias and variance terms of the ideal 
estimator, respectively Corollary [5] and Proposition [51 yields Theorem [21 
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